{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Requirement\r\n",
    "\r\n",
    "Python 3.7, numpy>=1.17.4, scipy>=1.3.2\r\n",
    "\r\n",
    "cython>=0.29.13 (Not required but highly recommended)\r\n",
    "\r\n",
    "Run the following code in bash/terminal to compile (Not required but highly recommended).\r\n",
    "```bash\r\n",
    "# The command below is not required but strongly recommended, as it will compile the cython code to run faster\r\n",
    "python setup.py build_ext --inplace\r\n",
    "```"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "\r\n",
    "# Spectral entropy\r\n",
    "\r\n",
    "To calculate spectral entropy, the spectrum need to be centroid first. When you are focusing on fragment ion's\r\n",
    "information, the precursor ion may need to be removed from the spectrum before calculating spectral entropy.\r\n",
    "\r\n",
    "Calculate spectral entropy for **centroid** spectrum with python is very simple (just one line with scipy package)."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "source": [
    "import numpy as np\r\n",
    "import scipy.stats\r\n",
    "\r\n",
    "spectrum = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)\r\n",
    "\r\n",
    "entropy = scipy.stats.entropy(spectrum[:, 1])\r\n",
    "print(\"Spectral entropy is {}.\".format(entropy))"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Spectral entropy is 0.3737888038158417.\n"
     ]
    }
   ],
   "metadata": {}
  },
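  {
   "cell_type": "markdown",
   "source": [
    "The one-liner above works because `scipy.stats.entropy` normalizes the intensities so that they sum to 1 and then\r\n",
    "computes the Shannon entropy `-sum(p * ln(p))`. As a minimal sketch, the same value can be reproduced with plain numpy.\r\n",
    "The names `intensity`, `p`, `entropy_manual` and `entropy_scipy` are only illustrative."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "import numpy as np\r\n",
    "import scipy.stats\r\n",
    "\r\n",
    "spectrum = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)\r\n",
    "\r\n",
    "# Use float64 for the arithmetic to limit rounding error.\r\n",
    "intensity = spectrum[:, 1].astype(np.float64)\r\n",
    "\r\n",
    "# Normalize the intensities so that they sum to 1.\r\n",
    "p = intensity / intensity.sum()\r\n",
    "\r\n",
    "# Shannon entropy with the natural logarithm (every p is > 0 here, so log is safe).\r\n",
    "entropy_manual = -np.sum(p * np.log(p))\r\n",
    "entropy_scipy = scipy.stats.entropy(intensity)\r\n",
    "print(\"Manual: {:.6f}, scipy: {:.6f}\".format(entropy_manual, entropy_scipy))"
   ],
   "outputs": [],
   "metadata": {}
  },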
  {
   "cell_type": "markdown",
   "source": [
    "\r\n",
    "For **profile** spectrum which haven't been centroid, you can use a ```clean_spectrum``` to centroid the spectrum.\r\n",
    "For example:"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "source": [
    "import numpy as np\r\n",
    "import scipy.stats\r\n",
    "import spectral_entropy\r\n",
    "\r\n",
    "spectrum = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)\r\n",
    "\r\n",
    "spectrum = spectral_entropy.clean_spectrum(spectrum)\r\n",
    "entropy = scipy.stats.entropy(spectrum[:, 1])\r\n",
    "print(\"Spectral entropy is {}.\".format(entropy))"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Spectral entropy is 0.2605222463607788.\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "We provide a function  ```clean_spectrum``` to help you remove precursor ion, centroid spectrum and remove noise ions.\r\n",
    "For example:"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "source": [
    "import numpy as np\r\n",
    "import spectral_entropy\r\n",
    "\r\n",
    "spectrum = np.array([[41.04, 0.3716], [69.071, 7.917962], [69.071, 100.], [86.0969, 66.83]], dtype=np.float32)\r\n",
    "clean_spectrum = spectral_entropy.clean_spectrum(spectrum,\r\n",
    "                                                 max_mz=85,\r\n",
    "                                                 noise_removal=0.01,\r\n",
    "                                                 ms2_da=0.05)\r\n",
    "print(\"Clean spectrum will be:{}\".format(clean_spectrum))"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Clean spectrum will be:[[69.071  1.   ]]\n"
     ]
    }
   ],
   "metadata": {}
  },
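  {
   "cell_type": "markdown",
   "source": [
    "To make two of these steps concrete, here is a rough numpy sketch: it drops peaks above `max_mz`, drops peaks below\r\n",
    "a fraction of the largest intensity, and then normalizes the intensities to sum to 1 (as the output above suggests).\r\n",
    "The function `filter_and_normalize` is hypothetical and is not the implementation behind `clean_spectrum`. In\r\n",
    "particular, it does no centroiding, and the order of the steps is an assumption."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "import numpy as np\r\n",
    "\r\n",
    "\r\n",
    "def filter_and_normalize(spectrum, max_mz=None, noise_removal=0.01):\r\n",
    "    \"\"\"Hypothetical helper (not part of spectral_entropy); no centroiding is done.\"\"\"\r\n",
    "    spectrum = np.asarray(spectrum, dtype=np.float64)\r\n",
    "    # Step 1: keep only peaks with m/z at or below max_mz.\r\n",
    "    if max_mz is not None:\r\n",
    "        spectrum = spectrum[spectrum[:, 0] <= max_mz]\r\n",
    "    if len(spectrum) == 0:\r\n",
    "        return spectrum\r\n",
    "    # Step 2: drop peaks below noise_removal * (largest remaining intensity).\r\n",
    "    threshold = noise_removal * spectrum[:, 1].max()\r\n",
    "    spectrum = spectrum[spectrum[:, 1] >= threshold]\r\n",
    "    # Step 3: normalize the intensities so that they sum to 1.\r\n",
    "    total = spectrum[:, 1].sum()\r\n",
    "    if total > 0:\r\n",
    "        spectrum[:, 1] /= total\r\n",
    "    return spectrum\r\n",
    "\r\n",
    "\r\n",
    "peaks = np.array([[41.04, 0.3716], [69.071, 100.0], [86.0969, 66.83]])\r\n",
    "print(filter_and_normalize(peaks, max_mz=85, noise_removal=0.01))"
   ],
   "outputs": [],
   "metadata": {}
  },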
  {
   "cell_type": "markdown",
   "source": [
    "# Entropy similarity\r\n",
    "\r\n",
    "Before calculate entropy similarity, the spectrum need to be centroid first. Remove the noise ions is highly recommend.\r\n",
    "Also, base on our test on NIST20 and Massbank.us database, remove ions have m/z higher than precursor ion's m/z - 1.6\r\n",
    "will greatly improve the spectral identification performance.\r\n",
    "\r\n",
    "We provide ```calculate_entropy_similarity``` function to calculate two spectral entropy."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "source": [
    "import numpy as np\r\n",
    "import spectral_entropy\r\n",
    "\r\n",
    "spec_query = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)\r\n",
    "spec_reference = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)\r\n",
    "\r\n",
    "# Calculate entropy similarity.\r\n",
    "similarity = spectral_entropy.calculate_entropy_similarity(spec_query, spec_reference, ms2_da=0.05)\r\n",
    "print(\"Entropy similarity:{}.\".format(similarity))"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Entropy similarity:0.8984398591079145.\n"
     ]
    }
   ],
   "metadata": {}
  },
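  {
   "cell_type": "markdown",
   "source": [
    "For intuition, entropy similarity can be written in terms of the spectral entropies of the two spectra and of their\r\n",
    "1:1 mixture: `1 - (2 * S_AB - S_A - S_B) / ln(4)`. The sketch below is not the package's code. It assumes the peaks\r\n",
    "were already cleaned and matched by hand into aligned intensity vectors (the two query peaks near m/z 86.1 are summed,\r\n",
    "and m/z 41.04 has no match in the query). It also assumes that intensities are weighted as `I ** (0.25 + 0.25 * S)`\r\n",
    "when the spectral entropy `S` is below 3. The helpers `weight_intensity` and `entropy_similarity_aligned` are\r\n",
    "hypothetical. Under these assumptions the weighted score should come out close to the value printed above."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "import numpy as np\r\n",
    "import scipy.stats\r\n",
    "\r\n",
    "\r\n",
    "def weight_intensity(intensity):\r\n",
    "    \"\"\"Assumed weighting: raise intensities to 0.25 + 0.25 * S when S < 3, then normalize.\"\"\"\r\n",
    "    s = scipy.stats.entropy(intensity)\r\n",
    "    if s < 3:\r\n",
    "        intensity = np.power(intensity, 0.25 + 0.25 * s)\r\n",
    "    return intensity / intensity.sum()\r\n",
    "\r\n",
    "\r\n",
    "def entropy_similarity_aligned(a, b, weighted=True):\r\n",
    "    \"\"\"Hypothetical helper for intensity vectors that are aligned peak by peak.\"\"\"\r\n",
    "    a = np.asarray(a, dtype=np.float64)\r\n",
    "    b = np.asarray(b, dtype=np.float64)\r\n",
    "    p = weight_intensity(a) if weighted else a / a.sum()\r\n",
    "    q = weight_intensity(b) if weighted else b / b.sum()\r\n",
    "    # Entropy of the 1:1 mixture and of each spectrum on its own.\r\n",
    "    s_mix = scipy.stats.entropy((p + q) / 2)\r\n",
    "    s_p = scipy.stats.entropy(p)\r\n",
    "    s_q = scipy.stats.entropy(q)\r\n",
    "    return 1 - (2 * s_mix - s_p - s_q) / np.log(4)\r\n",
    "\r\n",
    "\r\n",
    "# Intensities aligned by hand at m/z 41.04, 69.07 and 86.1 (0.0 means no matching peak).\r\n",
    "query = [0.0, 7.917962, 1.021589 + 100.0]\r\n",
    "reference = [37.16, 66.83, 999.0]\r\n",
    "print(\"Weighted: {:.4f}\".format(entropy_similarity_aligned(query, reference)))\r\n",
    "print(\"Unweighted: {:.4f}\".format(entropy_similarity_aligned(query, reference, weighted=False)))"
   ],
   "outputs": [],
   "metadata": {}
  },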
  {
   "cell_type": "markdown",
   "source": [
    "# Spectral similarity\r\n",
    "We also provide 44 different spectral similarity algorithm for MS/MS spectral comparison\r\n",
    "\r\n",
    "You can find the detail reference\r\n",
    "here: [https://SpectralEntropy.readthedocs.io/en/master/](https://SpectralEntropy.readthedocs.io/en/master/)\r\n",
    "\r\n",
    "# Example code\r\n",
    "\r\n",
    "Before calculating spectral similarity, it's highly recommended to remove spectral noise. For example, peaks have\r\n",
    "intensity less than 1% maximum intensity can be removed to improve identificaiton performance."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "import numpy as np\r\n",
    "import spectral_entropy\r\n",
    "\r\n",
    "spec_query = np.array([[69.071, 7.917962], [86.066, 1.021589], [86.0969, 100.0]], dtype=np.float32)\r\n",
    "spec_reference = np.array([[41.04, 37.16], [69.07, 66.83], [86.1, 999.0]], dtype=np.float32)\r\n",
    "\r\n",
    "# Calculate entropy similarity.\r\n",
    "similarity = spectral_entropy.similarity(spec_query, spec_reference, method=\"entropy\", ms2_da=0.05)\r\n",
    "print(\"Entropy similarity:{}.\".format(similarity))"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Entropy similarity:0.8984398591079145.\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "source": [
    "similarity = spectral_entropy.similarity(spec_query, spec_reference, method=\"unweighted_entropy\", ms2_da=0.05)\r\n",
    "print(\"Unweighted entropy similarity:{}.\".format(similarity))"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Unweighted entropy similarity:0.9826668790176113.\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "all_dist = spectral_entropy.all_similarity(spec_query, spec_reference, ms2_da=0.05)\r\n",
    "for dist_name in all_dist:\r\n",
    "    method_name = spectral_entropy.methods_name[dist_name]\r\n",
    "    print(\"Method name: {}, similarity score:{}.\".format(method_name, all_dist[dist_name]))"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Method name: Entropy distance, similarity score:0.8984398591079145.\n",
      "Method name: Unweighted entropy distance, similarity score:0.9826668790176113.\n",
      "Method name: Euclidean distance, similarity score:0.9704388194862964.\n",
      "Method name: Manhattan distance, similarity score:0.9663097634911537.\n",
      "Method name: Chebyshev distance, similarity score:0.9663097560405731.\n",
      "Method name: Squared Euclidean distance, similarity score:0.9991261365939863.\n",
      "Method name: Fidelity distance, similarity score:0.9828163981437683.\n",
      "Method name: Matusita distance, similarity score:0.8689137443332198.\n",
      "Method name: Squared-chord distance, similarity score:0.982816394418478.\n",
      "Method name: Bhattacharya 1 distance, similarity score:0.9860314218260302.\n",
      "Method name: Bhattacharya 2 distance, similarity score:0.9829623601324312.\n",
      "Method name: Harmonic mean distance, similarity score:0.9824790358543396.\n",
      "Method name: Probabilistic symmetric χ2 distance, similarity score:0.9824790470302105.\n",
      "Method name: Ruzicka distance, similarity score:0.9348156005144119.\n",
      "Method name: Roberts distance, similarity score:0.9507221579551697.\n",
      "Method name: Intersection distance, similarity score:0.9663097858428955.\n",
      "Method name: Motyka distance, similarity score:0.9663097858428955.\n",
      "Method name: Canberra distance, similarity score:0.475620517035965.\n",
      "Method name: Baroni-Urbani-Buser distance, similarity score:0.9711240530014038.\n",
      "Method name: Penrose size distance, similarity score:0.9129998942501335.\n",
      "Method name: Mean character distance, similarity score:0.9831548817455769.\n",
      "Method name: Lorentzian distance, similarity score:0.9376263842666843.\n",
      "Method name: Penrose shape distance, similarity score:0.9704388379255426.\n",
      "Method name: Clark distance, similarity score:0.5847746606268357.\n",
      "Method name: Hellinger distance, similarity score:0.6877124408992461.\n",
      "Method name: Whittaker index of association distance, similarity score:0.9082068549409137.\n",
      "Method name: Symmetric χ2 distance, similarity score:0.9235780252817392.\n",
      "Method name: Pearson/Spearman Correlation Coefficient, similarity score:0.9995291233062744.\n",
      "Method name: Improved Similarity, similarity score:0.5847746606268357.\n",
      "Method name: Absolute Value Distance, similarity score:0.9663097634911537.\n",
      "Method name: Dot product distance, similarity score:0.9992468165696725.\n",
      "Method name: Cosine distance, similarity score:0.9992468165696725.\n",
      "Method name: Reverse dot product distance, similarity score:0.9992468165696725.\n",
      "Method name: Spectral Contrast Angle, similarity score:0.9992467761039734.\n",
      "Method name: Wave Hedges distance, similarity score:0.4566912449792375.\n",
      "Method name: Jaccard distance, similarity score:0.997934231068939.\n",
      "Method name: Dice distance, similarity score:0.9989660476567224.\n",
      "Method name: Inner product distance, similarity score:0.8442940711975098.\n",
      "Method name: Divergence distance, similarity score:0.331483304773883.\n",
      "Method name: Avg (L1, L∞) distance, similarity score:0.9326195220152537.\n",
      "Method name: Vicis-Symmetric χ2 3 distance, similarity score:0.981897447258234.\n",
      "Method name: MSforID distance version 1, similarity score:0.8395898139303545.\n",
      "Method name: MSforID distance, similarity score:0.6301550967406659.\n",
      "Method name: Weighted dot product distance, similarity score:0.9998376420729537.\n"
     ]
    }
   ],
   "metadata": {}
  },
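  {
   "cell_type": "markdown",
   "source": [
    "Several of the scores above are variants of the classic cosine (dot product) score. As a point of comparison, here is\r\n",
    "a minimal sketch of cosine similarity on intensity vectors. It assumes the peaks were matched by hand into aligned\r\n",
    "vectors (the two query peaks near m/z 86.1 are summed, and m/z 41.04 has no match in the query).\r\n",
    "`cosine_similarity_aligned` is a hypothetical helper, not part of the package."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "import numpy as np\r\n",
    "\r\n",
    "\r\n",
    "def cosine_similarity_aligned(a, b):\r\n",
    "    \"\"\"Hypothetical helper: cosine of the angle between two aligned intensity vectors.\"\"\"\r\n",
    "    a = np.asarray(a, dtype=np.float64)\r\n",
    "    b = np.asarray(b, dtype=np.float64)\r\n",
    "    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))\r\n",
    "\r\n",
    "\r\n",
    "# Intensities aligned by hand at m/z 41.04, 69.07 and 86.1 (0.0 means no matching peak).\r\n",
    "query = [0.0, 7.917962, 1.021589 + 100.0]\r\n",
    "reference = [37.16, 66.83, 999.0]\r\n",
    "cosine = cosine_similarity_aligned(query, reference)\r\n",
    "print(\"Cosine similarity: {:.5f}\".format(cosine))"
   ],
   "outputs": [],
   "metadata": {}
  },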
  {
   "cell_type": "markdown",
   "source": [
    "# Supported similarity algorithm list:\r\n",
    "\r\n",
    "    \"entropy\": Entropy distance\r\n",
    "    \"unweighted_entropy\": Unweighted entropy distance\r\n",
    "    \"euclidean\": Euclidean distance\r\n",
    "    \"manhattan\": Manhattan distance\r\n",
    "    \"chebyshev\": Chebyshev distance\r\n",
    "    \"squared_euclidean\": Squared Euclidean distance\r\n",
    "    \"fidelity\": Fidelity distance\r\n",
    "    \"matusita\": Matusita distance\r\n",
    "    \"squared_chord\": Squared-chord distance\r\n",
    "    \"bhattacharya_1\": Bhattacharya 1 distance\r\n",
    "    \"bhattacharya_2\": Bhattacharya 2 distance\r\n",
    "    \"harmonic_mean\": Harmonic mean distance\r\n",
    "    \"probabilistic_symmetric_chi_squared\": Probabilistic symmetric χ2 distance\r\n",
    "    \"ruzicka\": Ruzicka distance\r\n",
    "    \"roberts\": Roberts distance\r\n",
    "    \"intersection\": Intersection distance\r\n",
    "    \"motyka\": Motyka distance\r\n",
    "    \"canberra\": Canberra distance\r\n",
    "    \"baroni_urbani_buser\": Baroni-Urbani-Buser distance\r\n",
    "    \"penrose_size\": Penrose size distance\r\n",
    "    \"mean_character\": Mean character distance\r\n",
    "    \"lorentzian\": Lorentzian distance\r\n",
    "    \"penrose_shape\": Penrose shape distance\r\n",
    "    \"clark\": Clark distance\r\n",
    "    \"hellinger\": Hellinger distance\r\n",
    "    \"whittaker_index_of_association\": Whittaker index of association distance\r\n",
    "    \"symmetric_chi_squared\": Symmetric χ2 distance\r\n",
    "    \"pearson_correlation\": Pearson/Spearman Correlation Coefficient\r\n",
    "    \"improved_similarity\": Improved Similarity\r\n",
    "    \"absolute_value\": Absolute Value Distance\r\n",
    "    \"dot_product\": Dot-Product (cosine)\r\n",
    "    \"dot_product_reverse\": Reverse dot-Product (cosine)\r\n",
    "    \"spectral_contrast_angle\": Spectral Contrast Angle\r\n",
    "    \"wave_hedges\": Wave Hedges distance\r\n",
    "    \"cosine\": Cosine distance\r\n",
    "    \"jaccard\": Jaccard distance\r\n",
    "    \"dice\": Dice distance\r\n",
    "    \"inner_product\": Inner Product distance\r\n",
    "    \"divergence\": Divergence distance\r\n",
    "    \"avg_l\": Avg (L1, L∞) distance\r\n",
    "    \"vicis_symmetric_chi_squared_3\": Vicis-Symmetric χ2 3 distance\r\n",
    "    \"ms_for_id_v1\": MSforID distance version 1\r\n",
    "    \"ms_for_id\": MSforID distance\r\n",
    "    \"weighted_dot_product\": Weighted dot product distance\"\r\n"
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.8.10 64-bit ('venv': venv)"
  },
  "language_info": {
   "name": "python",
   "version": "3.8.10",
   "mimetype": "text/x-python",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "pygments_lexer": "ipython3",
   "nbconvert_exporter": "python",
   "file_extension": ".py"
  },
  "interpreter": {
   "hash": "d0ee9955a1c49c3d1550e85906db5b8c92bf1d91ee05f2781fcf144369e863a2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}